Abstract 

We present a computer-aided design flow for quantum 
circuits, complete with automatic layout and control 
logic extraction. To motivate automated layout for 
quantum circuits, we investigate grid-based layouts 
and show a fourfold variation in performance as we 
vary grid structure and initial qubit placement. We 
then propose two polynomial-time design heuristics: 
a greedy algorithm suitable for small, congestion- 
free quantum circuits and a dataflow-based analy- 
sis approach to placement and routing with implicit 
initial placement of qubits. Finally, we show that 
our dataflow-based heuristic generates better layouts 
than the state-of-the-art automated grid-based lay- 
out and scheduling mechanism in terms of latency 
and potential pipelinability, but at the cost of some 
area. 



1 Introduction 

Quantum computing offers us the opportunity to 
solve certain problems thought to be intractable 
on a classical machine. For example, the following 
classically hard problems benefit from quantum 
algorithms: factorization [19], unsorted database 
search [6], and simulation of quantum mechanical 
systems [26]. 

In addition to significant work on quantum al- 
gorithms and underlying physics, there have been 
several studies exploring architectural trade-offs for 
quantum computers. Most such research [3, 16] has 
focused on simulating quantum algorithms on a fixed 
layout rather than on techniques for quantum circuit 
synthesis and layout generation. These studies tend 
to use hand-generated and hand-optimized layouts on 
which efficient scheduling is then performed. While 
this approach is quite informative in a new field, it 
quickly becomes intractable as the size of the circuit 
grows. 

Figure 1: The goal of our CAD flow is to automate 
the laying out of a quantum circuit to generate a 
physical layout, an intelligent initial placement of 
qubits, the associated classical control logic and 
annotations to help the online scheduler better use the 
layout optimizations as they were intended. This flow 
may then be used recursively to design larger blocks 
using previously created modules. 

Our goal is to automate most of the tasks involved 
in generating a physical layout and its associated 
control logic from a high-level quantum circuit 
specification (Figure 1). Our computer-aided design (CAD) 
flow should process a quantum circuit specification 
and produce the following: 

• a physical layout in the desired technology 

• an intelligent initial qubit placement in the lay- 
out 

• classical control circuitry specified in some hard- 
ware description language (HDL), which may 
then be run through a classical CAD flow 



• a set of annotations or "hints" for the online 
scheduler, allowing a tighter coupling of layout 
optimizations to actual runtime operation 

Much like a classical CAD flow, this quantum CAD 
flow is intended to be used hierarchically. We begin 
with a set of technology-specific basic blocks (some 
ion trap technology examples are given in Section 2). 
We then lay out some simple quantum circuits with 
the CAD flow, thus creating custom modules. The 
CAD flow may then be used recursively to create ever 
larger designs. This approach allows us to develop, 
evaluate and reuse design heuristics and avoids both 
the uncertainty and time-intensive nature of hand- 
generated layouts. 

1.1 Motivation for a Quantum CAD 
Flow 

Quantum circuits that are large enough to be "inter- 
esting" require the orchestration of hundreds of thou- 
sands of physical components. In approaching such 
problems, it is important to build upon prior work in 
classical CAD flows. Although the specifics of quantum 
technologies (such as those discussed in Section 2) 
are different from classical CMOS technologies, prior 
work in CAD research can give us insight into how 
to approach the automated layout of quantum gates 
and channels. 

Further, quantum circuits exhibit some interesting 
properties that lend themselves to automatic synthe- 
sis and computer-aided design techniques: 

Quantum ECC Quantum data is extremely frag- 
ile and consequently must remain encoded at all 
times - while being stored, moved, and com- 
puted upon. The encoded version of a circuit 
is often two or three orders of magnitude larger 
than the unencoded version. Further, the ap- 
propriate level of encoding may need to be se- 
lected as part of the layout process in order to 
achieve an appropriate "threshold" of error-free 
execution. Rather than burdening the designer 
with the complexities of adding fault-tolerance 
to a circuit, computer-aided synthesis, design 
and verification can perform such tasks automat- 
ically. 

Ancillae Quantum computations use many helper 
qubits known as ancillae. Ancillae consist of 
bits that are constructed, utilized and recycled 
as part of a computation. Sometimes, ancillae 
are explicit in a designer's view of the circuit. 
Often, however, they should be added automat- 
ically in the process of circuit synthesis, such as 
during the construction of fault-tolerant circuits 
from high-level circuit descriptions. An auto- 
matic design flow can insert appropriate circuits 
to generate and recycle ancillae without involv- 
ing the designer. 

Teleportation Quantum circuits present two pos- 
sibilities for data transport: ballistic movement 
and teleportation. Ballistic movement is rela- 
tively simple over short distances in technologies 
such as ion traps (Section 2). Teleportation is an 
alternative that utilizes a higher-overhead distri- 
bution network of entangled quantum bits to dis- 
tribute information with lower error over longer 
distances [9]. The choice to employ teleportation 
is ideally made after an initial layout has determined 
the long communication paths. Consequently, 
it is a natural target for a computer-aided design 
flow. 

1.2 Contributions 

In this paper, we make the following contributions: 

• We propose a CAD flow for automated design of 
quantum circuits and detail the necessary com- 
ponents of the flow. 

• We describe a technique for automatic synthe- 
sis of the classical control circuitry for a given 
layout. 

• We show that different grid-based architectures, 
which have been the focus of most prior work in 
this field, exhibit vastly varying performance for 
the same circuit. 

• We present heuristics for the placement and 
routing of quantum circuits in ion trap 
technology. 

• We lay out some quantum error correction cir- 
cuits and evaluate the effectiveness of the heuris- 
tics in terms of circuit area and latency. 
Figure 2: Example library of basic macroblocks. 
Each macroblock has a specific number of ports 
(shown as P0-P3) along with a set of electrodes used 
for ion movement and trapping. Some macroblocks 
contain a trap region where gates may be performed 
(black square). 

1.3 Paper Organization 

The rest of this paper is organized as follows. We 
introduce our chosen technology in Section 2, 
followed by an overview of prior work in the field in 
Section 3. In Section 4, we detail our proposed CAD 
flow and our evaluation metrics. In Section 5, we 
describe the control circuitry interface and scheduling 
protocol that we use in the following sections. 
Section 6 contains a study of grid-based layouts, which 
have been the basis of most prior work on this 
subject. In Section 7, we present a greedy approach to 
laying out quantum circuits, followed in Section 8 
by a much more scalable dataflow analysis-based 
approach to layout. Section 9 contains our experimental 
results for all three approaches to layout generation, 
and we conclude in Section 10. 

2 Ion Traps 

For our initial study, we choose trapped ions [HUT] as 
our substrate technology. Trapped ions have shown 
good potential for scalability [TU]. In this technol- 
ogy, a physical qubit is an ion, and a gate is a loca- 
tion where a trapped ion may be operated upon by a 
modulated laser. 

The ion is both trapped and ballistically moved 
by applying pulse sequences to discrete electrodes 
which line the edges of ion traps. Figure 3a shows an 
experimentally demonstrated layout for a three-way 
intersection [7]. A qubit may be held in place at any 
trap region, or it may be ballistically moved between 
them using the gray electrodes lining the paths. 

Rather than using ion traps as basic blocks, we de- 
fine a library of macroblocks consisting of multiple 
traps for two reasons. First, macroblocks abstract 
out some of the low-level details, insulating our 
analyses from variations in the technology 
implementations of ion traps. Details such as which ion 
species is used, specific electrode sizing and geometry 
(clearly variable in the layout in Figure 3a) and exact voltage 
levels necessary for trapping and movement are all 
encapsulated within the macroblock. Second, ballis- 
tic movement along a channel requires carefully timed 
application of pulse sequences to electrodes in non- 
adjacent traps. By defining basic blocks consisting of 
a few ion traps, we gain the benefit that crossing an 
interface between basic blocks requires communica- 
tion only between the two blocks involved. 

We use the library of macroblocks shown in 
Figure 2, each of which consists of a 3x3 grid of trap 
regions and electrodes, with ports to allow qubit 
movement between macroblocks. The black squares are 
gate locations; gates may not be performed at 
intersections or turns in ion trap technology. Each of 
these macroblocks may be rotated in a layout. This 
library is by no means exhaustive; however, it does 
provide the major pieces necessary to construct many 
physical circuits. The macroblocks we present are 
abstractions of experimentally demonstrated ion trap 
technology [7, 18]. In Figure 3, we show how one can map 
a demonstrated layout (Figure 3a) to our macroblock 
abstractions (Figure 3b). We model this layout as 
a set of StraightChannel and ThreeWayIntersection 
macroblocks. Above the ion trap plane is an array of 
MEMS mirrors which routes laser pulses to the gate 
locations in order to apply quantum gates [11], as 
shown in Figure 3c. 

Some key differences between this quantum circuit 
technology and classical CMOS are as follows: 

• "Wires" in ion traps consist of rectangular chan- 
nels, lined with electrodes, with atomic ions sus- 
pended above the channel regions and moved 
ballistically [13]. Ballistic movement of qubits 
requires synchronized application of voltages on 
channel electrodes to move data around. Thus 
each wire requires movement control circuitry to 
handle any qubit communication. 

• A by-product of the synchronous nature of the 
qubit wire channels is that these circuits can 
be used in a synchronous manner with no ad- 
ditional overhead. This enables some convenient 
pipelining options which will be discussed in Sec- 
tion o 
Figure 3: a) Experimentally demonstrated physical layout of a T-junction (three-way intersection). b) 
Abstraction of the circuit in (a), built using the StraightChannel and ThreeWayIntersection macroblocks 
shown in Figure 2. c) The ion traps are laid out on a plane, above which is an array of MEMS mirrors used 
to route and split the laser beams that apply quantum gates. 



• Each gate location will likely have the ability 
to perform any operation available in ion trap 
technology. This enables the reuse of gate locations 
within a quantum circuit. 

• Scalable ion trap systems will almost certainly 
be two-dimensional due to the difficulty of fab- 
ricating and controlling ion traps in a third di- 
mension [5]. This means that all ion crossings 
must be intersections. 

• Any routing channel may be shared by multi- 
ple ions as long as control circuits prevent multi- 
ion occupancy. Consequently, our circuit model 
resembles a general network, although schedul- 
ing the movement in a general networking model 
adds substantial complexity to our circuit. 

• Movement latency of ions is not only dependent 
on Manhattan distance but also on the geometry 
of the wire channel. Experimentally, it has been 
shown that a right angle turn takes substantially 
longer than a straight channel over the same dis- 
tance [18117]. 
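
The last point above can be illustrated with a minimal sketch of a geometry-aware latency model. The two cost constants are hypothetical placeholders chosen only to make the effect visible; they are not measured values, and the function name `path_latency` is our own.

```python
from typing import List, Tuple

# Hypothetical per-step costs (arbitrary time units); not measured values.
STRAIGHT_COST = 1.0   # moving one cell along a channel
TURN_COST = 10.0      # extra penalty for changing direction

def path_latency(path: List[Tuple[int, int]]) -> float:
    """Latency of a path given as a list of adjacent (x, y) grid cells."""
    latency = 0.0
    prev_dir = None
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        step = (x1 - x0, y1 - y0)
        if abs(step[0]) + abs(step[1]) != 1:
            raise ValueError("path cells must be adjacent")
        latency += STRAIGHT_COST
        # Any change of direction is charged as a turn.
        if prev_dir is not None and step != prev_dir:
            latency += TURN_COST
        prev_dir = step
    return latency

# Two paths with the same Manhattan distance (4) but different geometry.
straight = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
l_shaped = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
```

Under these assumed costs the L-shaped path is several times slower than the straight one, which is why a router that minimizes Manhattan distance alone can pick a poor route.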

3 Related Work 

Prior research has laid the groundwork for our 
quantum circuit CAD flow. Svore et al. [22, 23] proposed 
a design flow capable of pushing a quantum program 
down to physical operations. Their work outlined 
various file formats and provided initial 
implementations of some of the necessary tools. Similarly, 
Balensiefer et al. [2, 3] proposed a design flow and 
compilation techniques to address fault-tolerance and 
provided some tools to evaluate simple layouts. While 
our CAD flow builds upon some of these ideas, we 
concentrate on automatic layout generation and 
control circuitry extraction. 

Additionally, initial hand-optimized layouts have 
been proposed in the literature. Metodi et al. [15] 
proposed a uniform Quantum Logic Array architecture, 
which was later extended and improved in [24] . Their 
work concentrated on architectural research and did 
not delve into details of physical layout or scheduling. 
Finally, Metodi et al [16] created a tool to automati- 
cally generate a physical operations schedule given a 
quantum circuit and a fixed grid-based layout struc- 
ture. We extend and improve upon their work by 
adding new scheduling heuristics capable of running 
on grid-based and non-grid-based layouts. 

Maslov et al [13] have recently proposed heuristics 
for the mapping of quantum circuits onto molecules 
used in liquid state NMR quantum computing tech- 
nology. Their algorithm starts with a molecule to be 
used for computation, modeled as a weighted graph 
with edges representing atomic couplings within the 
molecule. The dataflow graph of the circuit is 
mapped onto the molecule graph with an effort to 
minimize overall circuit runtime. Our techniques fo- 
cus on circuit placement and routing in an ion trap 
technology and do not use a predefined physical sub- 
strate topology as in the NMR case. A new ion trap 
geometry is instead generated by our toolset for each 
circuit. 

4 Quantum CAD Flow 

The ultimate goal of a quantum CAD flow is identical 
to that of a standard classical CAD flow: to automate 
the synthesis and laying out of a circuit. For a 
quantum CAD flow, the output circuit consists of both the 
quantum portion and the associated classical control 
logic. 

Figure 4: An overview of our CAD flow for quantum 
circuits. Ovals represent files; rectangles represent 
tools. The gray area highlights the portions on which 
we focus in this paper. 

The quantum CAD flow we present elaborates on 
the design flows described in prior works [2, 22, 23]. 
Unlike prior work, our CAD flow addresses the need 
to integrate automatic generation of classical control 
into the flow. Figure 4 shows an overview of our CAD 
toolset. Rectangles are tools, while ovals represent 
intermediate file formats. Our toolset is built to be 
as similar to classical CAD flows as possible, while 
still accounting for the differences between classical 
and quantum computing described in Section 1.1. 

At the top, we begin with a high-level description 
of the desired quantum circuit. At present this spec- 
ification consists of a sequence of quantum assembly 
language (QASM [3]) instructions implementing the 
desired circuit, since this is a convenient format al- 
ready being used by various third-party tools. We are 
currently investigating extension of this high-level 
description to other formats, such as schematic entry, 
mathematical formulae or a more general high-level 
language. 

The synthesizer parses the QASM file and gener- 
ates a technology-independent netlist stored in XML 
format. From this point onward (downward in the 
figure), all file formats are XML. Additionally, infor- 
mation may be modified or added but generally not 
removed. As we move down the flow, we add more 
and more low level details, but we also keep high- 
level information such as encoded qubit groupings, 
nested layout modules, distinction between ancillae 
and data, etc. This allows low-level tools to make 
more intelligent decisions concerning qubit placement 
and channel needs based on high-level circuit struc- 
ture. It likewise allows logical level modification at 
the lowest levels without having to attempt to deduce 
qubit groupings. 

A technology parameter file specifies the complete 
set of basic blocks available for the layout (see 
examples in Figure 2), as well as design rules for 
connecting them. A basic block specification contains the 
following: 

• the geometry of the block in enough detail to 
allow fabrication 

• control logic for each operation possible within 
the block (including both movement and gates) 

• control logic for handling each operation possible 
at each interface 

The most basic function of the technology map- 
ping tool is to take a technology-independent netlist 
and map it onto allowed basic blocks to create the 
technology-dependent netlist. This may be more or 
less complicated depending upon the complexity of 
the basic blocks. In addition, it may need to trans- 
late to technology-specific gates (in case the QASM 
file uses gates not available in this technology), en- 
code the qubits used in the circuit (perhaps also auto- 
matically adding the ancilla and operation sequences 
necessary for error correction) and add fault tolerance 
to the final physical circuit. 

In the initial technology-dependent netlist, all 
qubits are physical qubits, meaning that encoding 
levels have been set (though they may still be modi- 
fied later). At this point, any technology-specific op- 
timizations may optionally be applied to the physical 
circuit encapsulated in this netlist. Additionally, if 
the circuit is complex enough to warrant the inclusion 
of a teleportation-based interconnection network [5], 
it is added to the netlist here using the higher level 
qubit grouping information in the netlist. 

Once the designer is happy with the netlist, a place- 
ment and routing tool lays out the netlist and adds 
any further channels needed for communication. This 
geometry-aware netlist may be iterated upon as nec- 
essary to refine the layout. Once the layout is final- 
ized, the classical control synthesis tool combines the 
control logic of the various components of the design, 
integrates interface control mechanisms to function 
properly and generates the unified control structure 
for the entire layout. Our control synthesis tool gen- 
erates a Verilog file, which may then be run through 
a classical CAD flow for implementation. 

The layout specification along with the con- 
trol logic file together comprise the geometry-aware 
netlist, which is the end result for the quantum cir- 
cuit initially specified in the high-level description. 
In order to allow hierarchical design of larger 
quantum circuits, we may now add this geometry-aware 
netlist to our set of custom modules. Future technol- 
ogy mappings may use both the basic blocks speci- 
fied in the technology parameter file and any custom 
modules we create (or acquire). 

The gray area in Figure 4 identifies the portions 
we shall be focusing on for the rest of this paper. We 
currently process the high-level description (a QASM 
file) directly into a technology-dependent netlist for 
ion traps using the macroblocks shown in Figure 2. 
Thus we perform a tech mapping, but no automatic 
encoding, interconnect or addition of gates for fault 
tolerance. In this paper, we focus on laying out 
low-level circuits, such as those for encoded ancilla 
generation and error correction. The classical control 
synthesis box of the CAD flow is discussed in 
Section 5, while placement and routing are analyzed and 
compared in Sections 7, 8 and 9. 

We use two main metrics to evaluate the perfor- 
mance of our CAD flow: area and latency. For area, 
we consider the bounding box around the layout, so 
irregularly-shaped layouts are penalized (since they 
have wasted space). To determine latency of circuit 
execution, we use the scheduling heuristic described 
in Section 5.2 and extended in Section 8.3. A third 
metric of interest is fault-tolerance. For small layouts 
and circuits, we can use third-party tools to 
determine whether a given layout and schedule is 
fault-tolerant [5], but we do not currently use the 
fault-tolerance metric in our iterative design flow. We use 
area and latency because, to a first approximation, 
lower area and lower latency are likely to decrease de- 
coherence. Previous algorithms to accurately deter- 
mine the error tolerance of a quantum circuit have in- 
volved very computationally-intensive analyses that 
would be inappropriate for circuits with more than 
a few dozen gates pQ. However, we are looking into 
ways to incorporate fault tolerance as a metric. 
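
As a small illustration of the area metric, the sketch below computes the bounding-box area of a layout given as a set of occupied macroblock coordinates. The representation (integer grid coordinates, one unit of area per macroblock) and the function name are assumptions made for this example only.

```python
from typing import Iterable, Tuple

def bounding_box_area(cells: Iterable[Tuple[int, int]]) -> int:
    """Area of the smallest axis-aligned rectangle covering all occupied cells."""
    cells = list(cells)
    if not cells:
        return 0
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)

# An L-shaped layout of five macroblocks and a compact 2x3 layout of six.
l_layout = {(0, 0), (1, 0), (2, 0), (0, 1), (0, 2)}
compact_layout = {(x, y) for x in range(3) for y in range(2)}
```

The L-shaped layout uses fewer macroblocks than the compact one but is charged a larger area, which is the penalty for irregular shapes described above.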

5 Control 

The classical control system is responsible for exe- 
cuting the quantum circuit, including deciding where 
and when gate operations occur and tracking and 
managing every qubit in the system. It is composed 
of the following major components: instruction issue 
logic, gate control logic and macroblock control logic. 
Instruction issue logic handles all instruction schedul- 
ing and determines qubit movement paths. Gate con- 
trol logic oversees laser resource arbitration, deciding 
which requested gate operations may occur at any 
given time. The macroblock control logic, which con- 
sists of an individual logic block for each macroblock 
in the system, handles all the internals of the mac- 
roblock, including details of gate operation for each 
gate possible within the macroblock, qubit movement 
within the macroblock and qubit movement into and 
out of the ports. 

5.1 Control Interfaces 

The first step in the control flow involves process- 
ing the quantum circuit's high-level description (the 
QASM file). The instruction issue logic accepts this 
stream of instructions as input and creates a series 
of qubit control messages. Using these qubit control 
messages, macroblock control logic blocks can deter- 
mine where to move qubits and when to execute a 
gate operation. Qubit control messages are simple 
bit streams composed of a qubit ID, along with a 
sequence of commands, as shown in Figure 5. When 
a qubit needs to perform an action, the instruction 
issue logic sends to it an appropriate control message 
which travels with the qubit as it traverses the 
layout. Once a macroblock receives a qubit and its 
corresponding control message, it uses the first command 
in the sequence to determine the operation it must 
perform. The macroblock then removes the command 
bits used and passes on the remaining control 
message to the next macroblock into which the qubit 
travels. In this manner, the instruction issue logic 
can create a multi-command qubit control message 
that specifies the path a qubit will traverse through 
consecutive macroblocks, along with where gate 
operations take place. The instruction issue logic only 
has to transmit this control message to the source 
macroblock, relying on the inter-macroblock 
communication interface to handle the rest. 

Figure 5: Example of how a qubit control message is 
constructed to move a qubit through a series of 
macroblocks. The qubit enters M0 and travels through 
M1 and M2, arriving at M3 where it is instructed to 
perform a CNOT. 
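
The consume-and-forward behavior of qubit control messages can be sketched as follows. This is an illustration only: the command names (`MOVE_EAST`, `CNOT`), the linear chain of macroblocks numbered 0, 1, 2, ... and the class and function names are all assumptions, and real messages are bit streams rather than Python strings.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

@dataclass
class ControlMessage:
    qubit_id: int
    commands: Deque[str] = field(default_factory=deque)

def run_message(msg: ControlMessage, start_block: int) -> Tuple[List[Tuple[int, str]], int]:
    """Walk a message through a chain of macroblocks.

    Each macroblock pops the first command, acts on it and forwards the
    remainder. Returns the (block, command) trace and the final block.
    """
    trace = []
    block = start_block
    while msg.commands:
        cmd = msg.commands.popleft()      # the current block consumes one command
        trace.append((block, cmd))
        if cmd.startswith("MOVE"):
            block += 1                    # assumption: blocks form a simple chain
    return trace, block

# The scenario of Figure 5: enter M0, pass through M1 and M2, CNOT at M3.
msg = ControlMessage(qubit_id=7, commands=deque(
    ["MOVE_EAST", "MOVE_EAST", "MOVE_EAST", "CNOT"]))
trace, final_block = run_message(msg, start_block=0)
```

After the run the message is empty, which corresponds to the state in which the holding macroblock must wait for a new message from the instruction issue logic.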

Communication between the instruction issue logic 
and the macroblocks takes place using a shared con- 
trol message bus in order to minimize the number 
of wire connections required by the instruction issue 
logic. Each macroblock listens to the control message 
bus for messages addressed to it and only processes 
messages with a destination ID that matches the 
macroblock's ID. A macroblock is only responsible for 
monitoring the control message bus if it contains a 
qubit that has no remaining command bits. This con- 
dition generally occurs after a gate operation, when 
the instruction issue logic is deciding what action the 
qubit should take next. Once the instruction issue 
logic sends a new control message for the qubit, the 
macroblock resumes operation. 

Macroblocks communicate with each other via con- 
trol signals associated with each quantum port in the 
macroblock. Each port has signals to control qubit 
movement into the macroblock and signals to control 
movement out of the macroblock via that port. These 
signals are connected to the corresponding signals of 
the neighboring macroblocks. The macroblocks 
assert a request signal to a destination macroblock 
when a qubit command indicates the qubit should 
cross into the next macroblock. If an available 
signal response is received, the qubit, along with its 
control message, can move across into the 
neighboring macroblock; if not, the qubit must wait until the 
available signal is present. 

The macroblock interface enables the instruction 
issue logic to schedule qubit movement as a path 
through a sequence of macroblocks, without concern- 
ing itself with the low level details of qubit move- 
ment. This modular system allows macroblocks to 
be replaced with any other macroblock that imple- 
ments the defined interface, without modifying the 
instruction issue logic. 

Additionally, macroblocks have an interface to the 
laser control logic. Whenever a macroblock is in- 
structed to perform a gate operation, it must request 
a laser resource through the laser control logic. The 
laser controller is responsible for aggregating requests 
from all the macroblocks in the system, and decid- 
ing when and where to send laser pulses. The laser 
controller also attempts to parallelize as many oper- 
ations as possible. Once the laser pulses have com- 
pleted, the laser controller notifies the macroblocks, 
indicating that the gate operation is complete. 

5.2 Instruction Scheduling 

The instruction issue logic is responsible for deter- 
mining the runtime execution order of the instruc- 
tions in the quantum circuit, which involves both 
preprocessing and online scheduling. The instruc- 
tion sequence is first preprocessed to assign priori- 
ties that will help during scheduling. The sequence is 
traversed from end to beginning, scheduling instruc- 
tions as late as dependencies allow, using realistic 
gate latencies but ignoring movement. Essentially, 
each instruction is labeled with the length of its crit- 
ical path to the end of the program. This is similar 
to the method used in [16], but we use critical path 
with gate times rather than the size of the dependent 
subtree. 
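
The priority assignment can be sketched as a single backward pass over the instruction sequence. The gate latencies below are hypothetical placeholders, and dependencies are inferred here purely from shared qubit operands, which is a simplifying assumption of this example.

```python
from typing import Dict, List, Tuple

# Hypothetical gate latencies (arbitrary time units); not values from the paper.
GATE_LATENCY = {"H": 1, "CNOT": 10, "MEASURE": 50}

Instruction = Tuple[str, Tuple[str, ...]]  # (gate name, qubit operands)

def critical_path_priorities(program: List[Instruction]) -> List[int]:
    """Label each instruction with its critical path length to program end.

    Movement is ignored. An instruction depends on the previous instruction
    that touched any of its qubits.
    """
    priorities = [0] * len(program)
    # Critical path of the nearest later instruction on each qubit.
    next_on_qubit: Dict[str, int] = {}
    for i in range(len(program) - 1, -1, -1):   # traverse end to beginning
        gate, qubits = program[i]
        tail = max((next_on_qubit.get(q, 0) for q in qubits), default=0)
        priorities[i] = GATE_LATENCY[gate] + tail
        for q in qubits:
            next_on_qubit[q] = priorities[i]
    return priorities

program = [
    ("H", ("q0",)),
    ("CNOT", ("q0", "q1")),
    ("H", ("q2",)),
    ("MEASURE", ("q1",)),
]
priorities = critical_path_priorities(program)
```

In this example the first `H` feeds the long CNOT-then-MEASURE chain and so receives the highest priority, while the independent `H` on `q2` receives the lowest.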

The instruction preprocessing generates an opti- 
mal schedule assuming infinite gates and zero move- 
ment cost. However, we wish to evaluate a layout 
with more realistic characteristics. Our scheduler is 
designed to schedule on an arbitrary graph, but the 
layouts provided to it by the place and route tool are 
in fact planar layouts using only right angles. In 
addition, the scheduler requires that the initial qubit 
positions be provided as well. 

Our scheduler implements a greedy scheduling 
technique. It maintains the set of ready instructions, 
i.e. those whose dependencies have all been fulfilled, 
and attempts to schedule them in priority order. The 
highest priority ready instruction (according to 
critical path) is therefore attempted first and is 
more likely to get access to the resources it needs. 
These contested resources include 
both gates and channels/intersections. Once all pos- 
sible instructions have been scheduled, time advances 
until one or more resources is freed and more instruc- 
tions may be scheduled. This scheduling and stalling 
cycle continues until the full sequence has been exe- 
cuted or until deadlock occurs, in which case it is de- 
tected and the highest priority unscheduled instruc- 
tion at the time of deadlock is reported. 
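
The schedule-and-stall cycle can be sketched as a priority-driven list scheduler. This is a deliberate simplification and not the scheduler described above: the only contested resource is a pool of `num_gates` interchangeable gate locations, and movement, channels and intersections are ignored. All names are our own.

```python
import heapq
from typing import Dict, Set

def greedy_schedule(latency: Dict[str, int], deps: Dict[str, Set[str]],
                    priority: Dict[str, int], num_gates: int) -> Dict[str, int]:
    """Return the start time of each instruction under a greedy policy."""
    remaining = {n: set(d) for n, d in deps.items()}  # unfinished dependencies
    start: Dict[str, int] = {}
    running = []            # min-heap of (finish time, instruction)
    time, free = 0, num_gates
    while len(start) < len(latency):
        ready = sorted((n for n in latency if n not in start and not remaining[n]),
                       key=lambda n: -priority[n])
        for n in ready:                      # highest priority first
            if free == 0:
                break
            start[n] = time
            free -= 1
            heapq.heappush(running, (time + latency[n], n))
        if not running:
            raise RuntimeError("deadlock: no instruction can make progress")
        # Stall: advance time until at least one resource is freed.
        time, done = heapq.heappop(running)
        finished = [done]
        while running and running[0][0] == time:
            finished.append(heapq.heappop(running)[1])
        free += len(finished)
        for d in remaining.values():
            d.difference_update(finished)
    return start

latency = {"a": 1, "b": 10, "c": 1, "d": 50}
deps = {"a": set(), "b": {"a"}, "c": set(), "d": {"b"}}
priority = {"a": 61, "b": 60, "c": 1, "d": 50}
one_gate = greedy_schedule(latency, deps, priority, num_gates=1)
two_gates = greedy_schedule(latency, deps, priority, num_gates=2)
```

With a single gate location the low-priority instruction `c` is pushed to the very end; with two it runs alongside `a` at time zero.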

Since we are interested in evaluating layouts rather 
than in designing an efficient online scheduler, we use 
very thorough searches over the graph in both gate 
assignment and pathfinding. This causes the sched- 
uler to take longer but takes much of the uncertainty 
concerning schedule quality out of our tests. In addi- 
tion, the scheduler reports stalling information which 
may be used for iterating upon the layout. 

5.3 Control Extraction 

Armed with well defined component interfaces and a 
method to execute the quantum instructions, all that 
remains to create the control system for a given quan- 
tum circuit is putting the pieces together. The quan- 
tum datapath is composed of an arbitrary number 
of macroblocks pulled from the component library. 
Each macroblock in our component library has asso- 
ciated with it classical control logic. The control logic 
handles all the internals of the macroblock including 
details of ion movement, ion trapping and gate oper- 
ation. In our library, the macroblock control logic is 
specified using behavioral Verilog modules. 

When the layout stage of the CAD flow creates a 
physical layout of macroblocks, we extract the cor- 
responding control logic blocks and assemble them 
together in a top-level Verilog module for the full 
control system, stitching together all necessary mac- 
roblock interfaces. This module instantiates all the 
appropriate macroblock control modules, along with 
the instruction issue logic and laser controller unit. 
Combined, these modules are assembled into a 
single Verilog module which implements the full 
classical control system for the quantum circuit and which 
may be input to a classical CAD flow for synthesis. 

Figure 6: QPOS grid structure constructed by tiling 
the highlighted 2x2 macroblock cell. 
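
The stitching step of control extraction might look roughly like the following generator, which emits the text of a top-level Verilog module from a layout description. Every module name, port name and wire name here is hypothetical, and the ports of the instruction issue logic and laser controller are omitted; the sketch shows only the overall shape of such a tool.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, str]  # (instance name, port name), e.g. ("m0", "p1")

def emit_top_level(layout: Dict[str, str], links: List[Tuple[Port, Port]]) -> str:
    """Emit Verilog text instantiating one control module per macroblock.

    layout maps instance name -> macroblock type. Each link (a, b) carries a
    request from port a to port b and an available response from b back to a.
    """
    lines = ["module control_top(input clk, input rst);"]
    conns: Dict[str, List[str]] = {inst: [".clk(clk)", ".rst(rst)"] for inst in layout}
    for i, ((src, sport), (dst, dport)) in enumerate(links):
        lines.append(f"  wire req_{i}, avail_{i};")
        conns[src] += [f".{sport}_req_out(req_{i})", f".{sport}_avail_in(avail_{i})"]
        conns[dst] += [f".{dport}_req_in(req_{i})", f".{dport}_avail_out(avail_{i})"]
    for inst, kind in layout.items():
        lines.append(f"  {kind}_ctrl {inst} ({', '.join(conns[inst])});")
    # Shared units named in the text; their ports are left out of this sketch.
    lines.append("  instruction_issue issue (.clk(clk), .rst(rst));")
    lines.append("  laser_controller laser (.clk(clk), .rst(rst));")
    lines.append("endmodule")
    return "\n".join(lines)

verilog = emit_top_level(
    {"m0": "straight_channel", "m1": "three_way_intersection"},
    [(("m0", "p1"), ("m1", "p0"))],
)
```

A real implementation would also connect the shared control message bus and the laser request signals to every macroblock instance.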



6 Grid-based Layouts 

We begin our exploration of placement and routing 
heuristics by considering grid-based layouts. A ma- 
jority of the work done in the field has concentrated 
on these types of layouts. In all of these works, a lay- 
out is constructed by first designing a primitive cell 
and then tiling this cell into a larger physical layout. 
For example, the authors of [15, 16] manually design 
a single cell, and for any given quantum circuit, they 
use that cell to construct an appropriately sized 
layout. In [23], the authors automate the generation of 
an H-Tree based layout constructed from a single cell 
pattern. Similarly, [3] uses a cell such as in [23] but 
also provides some tools to evaluate the performance 
of a circuit when the number of communication chan- 
nels and gate locations within the cell is varied. We 
use a combination of these methods to implement a 
tool that automatically creates a grid-based physical 
layout for a given quantum circuit. 

The grid-based physical layouts generated by our 
tools are constructed by first creating a primitive cell 
out of the macroblocks mentioned in Section 2 and 
then tiling the cell to fill up the desired area. For 
example, Figure 6 shows how a 2x2 cell can 
be tiled to create the layout used in [16] (referred to 
henceforth as the QPOS grid). These types of simple 
structures are easy to automatically generate given 
only the number of qubits and gate operations in the 
quantum circuit. Furthermore, grid-based structures 
are very appealing to consider because, apart from 
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[[23,1 ,7]] Golay Encode Grid Search (3x2 cell 
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Figure 7: Variations in runtime of various grid-based 
physical layouts for [[23,1,7]] Golay encode circuit. 
For each grid structure the minimum, mean, and 
maximum time are plotted. 



Figure 8: Comparison of the best 3x2 cell for two dif- 
ferent circuits, (a) The best cell for the [[23, 1, 7]] Go- 
lay encode circuit, (b) The best cell for the [[7, 1, 3]] 
LI correct circuit. 



selecting the number of cells in the layout and the 
initial qubit placement, no other customization is re- 
quired in order to map a quantum circuit onto the 
layout. The regular pattern also makes it easy to de- 
termine how qubits move through the system, as sim- 
ple schemes such as dimension-ordered routing can be 
used. 
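As a minimal sketch of dimension-ordered routing on such a regular grid, the function below moves a qubit along x first and then along y. The coordinate convention and the function name are illustrative assumptions, not part of our tool.

```python
def dimension_ordered_route(src, dst):
    """Return the (x, y) grid positions visited from src to dst,
    resolving the x offset first and the y offset second."""
    x, y = src
    path = [(x, y)]
    step_x = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step_x
        path.append((x, y))
    step_y = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step_y
        path.append((x, y))
    return path


route = dimension_ordered_route((0, 0), (2, 1))
print(route)
```

Because every route turns at most once, the path a qubit takes is fully determined by its source and destination, which is what makes movement on a regular grid easy to predict.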

The approach we use to generate the grid-based 
layout for a given quantum circuit is as follows: 

1. Given the cell size, create a valid cell structure 
out of macroblocks. 

2. Create a layout by tiling the cell to fill up the 
desired area. 

3. Assign initial qubit locations. 

4. Simulate the quantum circuit on the layout to 
determine the execution time. 

The first step finds a valid cell structure. A cell is 
valid if all the macroblocks that open to the perime- 
ter of the cell have an open macroblock to connect to 
when the cell is tiled. Also, a cell cannot have an iso- 
lated macroblock within it that is unreachable. Once 
we tile this valid cell to create a larger layout, we must 
decide on how to assign initial qubit locations. The 
two methods we utilize are: a systematic left to right, 
one qubit per cell approach, and a randomized place- 
ment. The systematic placement allows us to fairly 
compare different layouts. However, since the initial
placement of the qubits can affect the performance
of the circuit, the tool also tries a number of random 
placements in an effort to determine if the systematic 
placement unfairly handicapped the circuit. 
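The two validity rules can be sketched as follows. We assume, purely for illustration, that each macroblock is described by the set of sides on which it is open (a string over "NESW", with the empty string meaning no macroblock), and that reachability is checked on the wrapped-around cell as a stand-in for the tiled layout. This encoding is hypothetical and simpler than a real macroblock library.

```python
from collections import deque

OPPOSITE = {"N": "S", "S": "N", "E": "W", "W": "E"}
DELTA = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def neighbor(cell, r, c, side):
    """Position reached from (r, c) through `side`, wrapping as in a tiling,
    plus a flag telling whether the step crossed the cell perimeter."""
    rows, cols = len(cell), len(cell[0])
    dr, dc = DELTA[side]
    nr, nc = r + dr, c + dc
    crossed = not (0 <= nr < rows and 0 <= nc < cols)
    return nr % rows, nc % cols, crossed


def is_valid_cell(cell):
    blocks = [(r, c) for r in range(len(cell))
              for c in range(len(cell[0])) if cell[r][c]]
    if not blocks:
        return False
    # Rule 1: an opening on the perimeter must meet an opening after tiling.
    for r, c in blocks:
        for side in cell[r][c]:
            nr, nc, crossed = neighbor(cell, r, c, side)
            if crossed and OPPOSITE[side] not in cell[nr][nc]:
                return False
    # Rule 2: no isolated macroblock; flood fill through matched openings.
    seen = {blocks[0]}
    queue = deque([blocks[0]])
    while queue:
        r, c = queue.popleft()
        for side in cell[r][c]:
            nr, nc, _ = neighbor(cell, r, c, side)
            if OPPOSITE[side] in cell[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen) == len(blocks)


cross_cell = [["NESW", "EW"], ["NS", ""]]      # intersection plus two channels
dead_end_cell = [["EW", "NS"]]                 # west opening meets a closed side
split_cell = [["EW", "EW"], ["EW", "EW"]]      # two rows that never connect
print(is_valid_cell(cross_cell), is_valid_cell(dead_end_cell), is_valid_cell(split_cell))
```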

This layout generation and evaluation procedure is 
iterated upon until all valid cell configurations of the 
given size are searched. We then repeat this process 
for different cell sizes. The cell structure that results 
in the minimum simulated time for the circuit is used 
to create the final layout. 
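The search loop itself can be sketched as below. The `simulate` callable stands in for our scheduler and simulator and must be supplied by the caller; its signature, the representation of a placement as a list (entry i is the qubit starting in the i-th cell), and the number of random trials are assumptions made for this example.

```python
import random


def search_best_cell(cells, qubits, simulate, random_trials=3, seed=0):
    """Return (best_time, best_cell, best_placement) over all candidate cells.
    Each cell is tried with the systematic placement and a few random ones."""
    rng = random.Random(seed)
    best = None
    for cell in cells:
        placements = [list(qubits)]          # systematic: left to right
        for _ in range(random_trials):
            shuffled = list(qubits)
            rng.shuffle(shuffled)            # randomized placement
            placements.append(shuffled)
        for placement in placements:
            t = simulate(cell, placement)
            if best is None or t < best[0]:
                best = (t, cell, placement)
    return best


def toy_simulate(cell, placement):
    """Toy cost model, NOT a real simulator: larger cells and qubits
    placed far from their index both cost time."""
    return len(cell) + sum(abs(i - q) for i, q in enumerate(placement))


best = search_best_cell(["abc", "ab", "abcd"], [0, 1, 2], toy_simulate)
print(best)
```

The cost of this loop is the number of valid cells times the number of placements times one full simulation, which is why it is only practical for the smallest cell sizes.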

As an example, Figure 7 shows the results of
searching for the best layout composed of 3 x 2 sized
cells targeting the [[23, 1, 7]] Golay encode circuit [21],
one of our benchmarks shown in Table 1. More than
900 valid cell configurations were tested. For each 
cell configuration, we try multiple initial qubit place- 
ments (as mentioned earlier) resulting in a range of 
runtimes for each cell configuration. Differences in 
the runtime of the circuit are not limited to just vari- 
ations on the cell configuration but are in fact also 
highly dependent on the initial qubit placement. 

Figure 8 shows the best cell structure found by
conducting a search of all 2 x 2, 2 x 3, and 3 x 2 sized
cells for two different circuits. The main result of this
search is that the best cell structure used to create 
the grid-based layout is dependent on what circuit 
will be run upon it. By varying the location of gates 
and communication channels, we tailor the structure 
of the layout to match the circuit requirements. 






While this type of exhaustive search of physical 
layouts is capable of finding an optimal layout for a 
quantum circuit, it suffers from a number of draw- 
backs. Namely, as the size of the cell increases, the 
number of possible cell configurations grows exponen- 
tially. Searching for a good layout for anything but 
the smallest cell sizes is not a realistic option. Fur- 
thermore, while small circuits may be able to take 
advantage of primitive cell based grids, larger cir- 
cuits will require a less homogeneous layout. One 
approach to doing this is to construct a large layout 
out of smaller grid-based pieces, all with different cell 
configurations. While this approach is interesting, we 
feel a more promising approach is one that resembles 
a classical CAD flow, where information extracted 
from the circuit is used to construct the layout. 

7 Greedy Place and Route 

One problem we observed in the regular grid layout 
design was that the high amount of channel conges- 
tion due to limited bandwidth causes densely-packed 
(occupied) gates. Additionally, we found that a num- 
ber of gate locations and channels in many of the 
grids were not even used by the scheduler to perform 
the circuit. 

We present a new heuristic that attempts to solve 
some of these problems. The heuristic is a simple 
greedy algorithm that starts with only as many gate 
locations as qubits (because we assume that qubits 
only rest in storage/gate locations) and no channels 
connecting the gates. It iterates with the circuit 
scheduler, moving and connecting gate locations un- 
til the qubits can communicate sufficiently to perform 
the specified circuit. The current layout is fed into 
the circuit scheduler which tries to schedule until it 
finds qubits in gate locations that cannot communi- 
cate to perform a gate. The place and router then 
connects the problematic gate locations and tries 
scheduling on the layout again. The iteration fin- 
ishes once the circuit can be successfully completed. 
Our algorithm bears some similarity to the iterative 
procedure in adaptive cluster growth placement [12]
in classical CAD. Gate locations are placed from the 
center outward as the circuit grows to fit a rectilinear 
boundary. 

The placer can move gate locations that have to 
be connected if they are not already connected to 
something else. The router connects gate locations 



by making a direct path in the x and y directions 
between them and placing a new channel, shifting 
existing channels out of the way. Since channels are
allowed to overlap, intersections are inserted where 
the new channels cut across existing ones. 
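The routing step can be sketched as follows, assuming a layout stored as a dictionary from (x, y) positions to a cell kind. The sketch draws the x segment first and then the y segment, and it omits the shifting of existing channels; it also treats any reuse of a channel cell as an intersection, which is a simplification. All names here are hypothetical.

```python
def route_channel(layout, a, b):
    """Add an L-shaped channel between gate locations a and b (in place).
    A cell that already holds a channel becomes an intersection."""
    (ax, ay), (bx, by) = a, b
    step_x = 1 if bx >= ax else -1
    step_y = 1 if by >= ay else -1
    path = [(x, ay) for x in range(ax, bx + step_x, step_x)]
    path += [(bx, y) for y in range(ay + step_y, by + step_y, step_y)]
    for pos in path[1:-1]:                   # endpoints are the gate locations
        kind = layout.get(pos)
        if kind is None:
            layout[pos] = "channel"
        elif kind == "channel":
            layout[pos] = "intersection"
        # existing gates and intersections are left untouched
    return path


layout = {(0, 1): "gate", (4, 1): "gate", (2, 0): "gate", (2, 3): "gate"}
route_channel(layout, (0, 1), (4, 1))        # horizontal channel
route_channel(layout, (2, 0), (2, 3))        # vertical channel crossing it
print(layout[(2, 1)], layout[(2, 2)])
```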

This technique has the advantage that, since the 
circuit scheduler prioritizes gates based on gate delay 
critical path, potentially critical gates are mapped 
to gate locations and connected early in the process. 
Thus critical gates tend to be initially placed close 
together to shorten the circuit critical path. Ad- 
ditionally, gate locations that need to communicate 
can be connected directly instead of using a general 
shared grid channel network, where congestion can 
occur and cause qubits to be routed along unneces- 
sarily long paths. 

A disadvantage of this heuristic is that gate place- 
ment is done to optimize critical path, not to min- 
imize channel intersections. This means that the 
layout could end up having many 4-way channel in- 
tersections and turns, both of which have more de- 
lay than 2-way straight channels. Additionally, even 
though critical gates are mapped and placed near 
each other, the channel routing algorithm tends to 
spread these gate locations apart as more channels 
cut through the center of the circuit. We discuss our
experimental evaluation of this heuristic in Section 9.

8 Dataflow-Based Layouts 

As described in Section 6, a systematic row by row
initial placement for qubits allows us to make some- 
what accurate comparisons between different grid- 
based layouts, while a random initial qubit placement 
allows us to test a single grid's dependence on qubit 
starting positions. However, in laying out a quantum 
circuit, we would like to have a more intelligent and 
natural means of determining initial qubit placement. 
For this, we turn to the dataflow graph representation 
of the circuit. 

8.1 Dataflow Graph Analysis 

A)  H   Q0
B)  H   Q1
C)  H   Q2
D)  H   Q3
E)  CX  Q0,Q1
F)  CX  Q2,Q3
G)  CX  Q1,Q2
H)  CX  Q2,Q3
I)  CX  Q0,Q2

(a)

Figure 9: a) A QASM instruction sequence, b) A quantum circuit equivalent to the instruction sequence in
(a), c) A dataflow graph equivalent to the instruction sequence in (a). Each node represents an instruction,
as labeled in (a). Each arc represents a qubit dependency.

Figure 10: a) Each node (instruction) is initialized in its own node group (NG, outlined by the dotted lines),
which corresponds to a physical gate location in a layout. Once placed, we extract physical distances between
the nodes (the edge labels), b) We find the longest edge weight on the longest critical path (the length 5
edge on the path C-F-G-H-I; solid bold arrows) and merge its two node groups to eliminate that latency,
c) We recompute the critical path (A-E-I; dashed bold arrows) and merge its node groups, and so on.

Figure 9a shows a QASM instruction sequence
consisting of Hadamard gates (H) and controlled
bit-flips (CX) operating on qubits Q0, Q1, Q2 and Q3,
with each instruction labeled by a letter. Figure 9b
shows the equivalent sequence of operations in
standard quantum circuit format. Either of these may
be translated into the dataflow graph shown in
Figure 9c, where each node represents a QASM
instruction (as labeled in Figure 9a) and each arc
represents a qubit dependency. With this dataflow
graph, we may perform some analyses to help us place
and route a layout for our quantum circuit.
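The translation from an instruction sequence to a dataflow graph can be sketched as follows: each instruction becomes a node, and an arc is added from the previous instruction that touched a qubit to the next one that uses it. The tuple encoding of instructions and the function name are assumptions made for illustration; the instruction list is the one in Figure 9a.

```python
def build_dataflow_graph(instructions):
    """instructions: list of (label, gate, qubits) in program order.
    Returns arcs (src_label, dst_label, qubit), one per qubit dependency."""
    last_use = {}      # qubit -> label of the most recent instruction using it
    arcs = []
    for label, _gate, qubits in instructions:
        for q in qubits:
            if q in last_use:
                arcs.append((last_use[q], label, q))
            last_use[q] = label
    return arcs


FIG9A = [
    ("A", "H", ["Q0"]),
    ("B", "H", ["Q1"]),
    ("C", "H", ["Q2"]),
    ("D", "H", ["Q3"]),
    ("E", "CX", ["Q0", "Q1"]),
    ("F", "CX", ["Q2", "Q3"]),
    ("G", "CX", ["Q1", "Q2"]),
    ("H", "CX", ["Q2", "Q3"]),
    ("I", "CX", ["Q0", "Q2"]),
]

arcs = build_dataflow_graph(FIG9A)
for arc in arcs:
    print(arc)
```

Following the arcs of a single qubit recovers the per-qubit paths used later, for example A-E-I for Q0 and C-F-G-H-I for Q2.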

The general idea is that we shall create node groups
in the dataflow graph which correspond to distinct
gate locations that may then be placed and routed
on a layout. All instructions within a single node
group are guaranteed to be executed at a single gate
location, as elaborated upon in Section 8.3. To begin
with, we create a node group for each instruction,
giving us a dataflow group graph, as shown in
Figure 10a. If we lay out this group graph with a
distinct designated gate for each instruction (using
heuristics discussed in Section 8.2), we get a layout in
which the starting location of each qubit is specified
implicitly by its first gate location, so no additional
initial placement heuristic is needed.

From this layout we can extract movement latency
between nodes and label the edges with weights (as in
Figure 10a). We now find the longest critical path by
qubit. The critical path A-E-I of qubit Q0 has length
14 (the dashed bold arrows), while the critical path
C-F-G-H-I of qubit Q2 has length 15 (the solid bold
arrows). We select the longest edge on the longest
critical path, which is the edge G-H with weight 5.
We merge these two node groups to eliminate this
latency, in effect specifying that these two instructions
should occur at the same gate location (Figure 10b).
We then update the layout and recompute distances.
Assuming we merged these two node groups to the
location of H (NG8), then the weight of edge F-G
changes to 1 (to match the weight of edge F-H) and
the weight of edge E-G probably changes to 6 (former
E-G plus former G-H), but the exact change really
depends on layout decisions. The new critical path
is now A-E-I, so if we do this again, we merge node
groups NG5 and NG9 to eliminate the edge of weight
8, and we get the group graph in Figure 10c.
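The selection step can be sketched as below. Only the weights of G-H (5), E-I (8), F-H (1) and E-G (1) and the path totals 14 and 15 come from the text above; the remaining weights are invented so that the totals match, and we assume for simplicity that a path length is the plain sum of its edge weights.

```python
# Arcs of the example dataflow graph: (source, destination, qubit).
ARCS = [
    ("A", "E", "Q0"), ("B", "E", "Q1"), ("C", "F", "Q2"), ("D", "F", "Q3"),
    ("E", "G", "Q1"), ("F", "G", "Q2"), ("G", "H", "Q2"), ("F", "H", "Q3"),
    ("E", "I", "Q0"), ("H", "I", "Q2"),
]

# Movement latencies; values other than G-H, E-I, F-H, E-G are hypothetical.
WEIGHT = {
    ("A", "E"): 6, ("B", "E"): 2, ("C", "F"): 3, ("D", "F"): 2,
    ("E", "G"): 1, ("F", "G"): 4, ("G", "H"): 5, ("F", "H"): 1,
    ("E", "I"): 8, ("H", "I"): 3,
}


def next_merge(arcs, weight):
    """Return (critical_qubit, path_length, edge_to_merge)."""
    paths = {}
    for src, dst, q in arcs:
        paths.setdefault(q, []).append((src, dst))
    best = None
    for q, edges in paths.items():
        length = sum(weight[e] for e in edges)
        if best is None or length > best[1]:
            best = (q, length, edges)
    q, length, edges = best
    return q, length, max(edges, key=lambda e: weight[e])


first = next_merge(ARCS, WEIGHT)

# Weights after merging G into the location of H, as described in the text.
after_merge = dict(WEIGHT)
after_merge.update({("G", "H"): 0, ("F", "G"): 1, ("E", "G"): 6})
second = next_merge(ARCS, after_merge)
print(first, second)
```

In the real flow the updated weights come from placing and routing the merged graph again, not from a hand-written update as in this example.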

In merging nodes, there is the possibility that two
qubit starting locations get merged, complicating the
assignment of initial placement. For this reason, we 
add a dummy input node for each qubit before its 
first instruction. The merging heuristic doesn't allow 
more than one input node in any single node group, 
so we maintain the benefit of having an intelligent 
initial qubit placement without extra work. 
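One way to enforce this constraint is sketched below with a union-find structure that tracks whether a node group already contains an input node. The choice of union-find and all names are our illustration here, not a description of the tool's internals.

```python
class NodeGroups:
    """Union-find over dataflow nodes that refuses to place two dummy
    input nodes in the same node group."""

    def __init__(self, nodes, input_nodes):
        self.parent = {n: n for n in nodes}
        self.has_input = {n: n in input_nodes for n in nodes}

    def find(self, n):
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]   # path halving
            n = self.parent[n]
        return n

    def merge(self, a, b):
        """Merge the groups of a and b; return False if the merge is rejected."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if self.has_input[ra] and self.has_input[rb]:
            return False      # would merge two qubit starting locations
        self.parent[rb] = ra
        self.has_input[ra] = self.has_input[ra] or self.has_input[rb]
        return True


groups = NodeGroups(["in0", "in1", "A", "B", "E"], {"in0", "in1"})
ok_a = groups.merge("in0", "A")      # Q0 starts at the location of A
ok_b = groups.merge("in1", "B")      # Q1 starts at the location of B
rejected = groups.merge("A", "B")    # both groups hold an input node
ok_e = groups.merge("A", "E")
print(ok_a, ok_b, rejected, ok_e)
```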

There is an important trade-off to consider when 
taking this merging approach. A tiled grid layout 
provides plenty of gate location reuse but is un- 
likely to provide any pipelinability without great ef- 
fort. A layout of the group graph in Figure 10a
(with each instruction assigned to a distinct gate 
location) provides no gate location reuse at all but 
high potential pipelinability. This raises the ques- 
tion of whether we wish to minimize area and time 
(for critical data qubits), maximize throughput of a 
pipeline (for ancilla generation), or compromise at 
some middle ground where small sets of nearby nodes 
are merged in order to exploit locality while still re- 
taining some pipelinability. We intend to further ex- 
plore this topic in the future. 

8.2 Placement and Routing 

Taking the group graph from the dataflow analysis 
heuristic, the placement algorithm takes advantage 
of the fanout-limited gate output imposed by the No- 
Cloning Theorem [55] to lay out the dataflow-ordered 
gate locations in a roughly rectangular block. We 
adopt a gate array-style design, where gate locations 
are laid out in columns according to the graph, with 
space left between each pair of columns for necessary 
channels. This can lead to wasted space due to a 
linear layout of uneven column sizes, so we may also 
perform a folding operation, wherein a short column 
may be folded in (joined) with the previous column, 
thus filling out the rectangular bounding box of the 
layout as much as possible and decreasing area. The 
columns are then sorted to position gate locations 
that need to be connected roughly horizontal to one 
another. This further minimizes channel distance be- 
tween connected gate locations and reduces the num- 
ber of high-latency turns. 
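The column assignment and folding steps can be sketched as follows. We assume here that a gate location's column is its depth in the group graph (the longest chain of predecessors) and that a column is folded into the previous one whenever the combined height does not exceed the tallest column; both rules are simplifications chosen for this example, and the column sorting step is not shown.

```python
def assign_columns(nodes, arcs):
    """nodes: labels in program (topological) order; arcs: (src, dst) pairs.
    Returns a list of columns, each a list of node labels."""
    preds = {n: [] for n in nodes}
    for src, dst in arcs:
        preds[dst].append(src)
    depth = {}
    for n in nodes:
        depth[n] = 1 + max((depth[p] for p in preds[n]), default=-1)
    columns = [[] for _ in range(max(depth.values()) + 1)]
    for n in nodes:
        columns[depth[n]].append(n)
    return columns


def fold_columns(columns):
    """Stack a short column onto the previous one when the result is no
    taller than the tallest original column."""
    limit = max(len(col) for col in columns)
    folded = [list(columns[0])]
    for col in columns[1:]:
        if len(folded[-1]) + len(col) <= limit:
            folded[-1].extend(col)
        else:
            folded.append(list(col))
    return folded


def bounding_box(columns):
    """Area in gate locations only; channels are ignored in this sketch."""
    return len(columns) * max(len(col) for col in columns)


NODES = list("ABCDEFGHI")
EDGES = [("A", "E"), ("B", "E"), ("C", "F"), ("D", "F"), ("E", "G"),
         ("F", "G"), ("G", "H"), ("F", "H"), ("E", "I"), ("H", "I")]

columns = assign_columns(NODES, EDGES)
folded = fold_columns(columns)
print(columns, bounding_box(columns))
print(folded, bounding_box(folded))
```

For the nine-instruction example the bounding box shrinks from 5 x 4 to 3 x 4 gate locations after folding.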

Once gate locations are placed, we use a grid-based
model in which we first route local wire channels
between gate locations that are in adjacent or the same
columns. These channels tend to be only a few
macroblocks long each. A separate global channel is then
inserted between each pair of rows and between each
pair of columns of gate locations. These global
channels stretch the full length of the layout. There are
no real routing constraints in our simple model since
channels are allowed to overlap and turn into 3- or
4-way intersections. We depend on the dataflow
column sorting in the placement phase to reduce the
number of intersections and shared local channels.
While local channels could technically be used for
global routing and vice versa, we've found that this
division in routing tends to divide the traffic and
separate local from long-distance congestion.

Figure 11: The placement and routing portion of our
CAD flow (shown in Figure 4) takes a technology-
dependent netlist and translates it into a geometry-
aware netlist through an iterative process involving
dataflow analysis and placement and routing
techniques.

With these basic placement and routing schemes,
we may now iterate upon the layout, as shown in
Figure 11. The technology-dependent netlist is
translated into a dataflow group graph with a separate
gate location for each instruction (Figure 10a). This
group graph is then placed, routed and scheduled to
get latency and identify the runtime critical path (as
opposed to the critical path in the group graph, which
fails to take congestion into account). The two node
groups joined by the longest-latency move on the
runtime critical path are merged into one node group,
thus eliminating the move, since a node group represents
a single gate location. This new group graph is then
placed, routed and scheduled again to find the next
pair of node groups to merge.

Once this process has iterated enough times, we 
reach a point where congestion at some heavily 
merged node group is actually hurting the latency 
with each further merge. We alleviate this conges- 
tion by adding storage nodes (essentially gate loca- 
tions that don't actually perform gates) near the con- 
gested node group. This increases the area slightly 
but maintains the locality exploited by the merging 
heuristic. If congestion persists, we halt the algo- 
rithm, back up a few merging steps and output the 
geometry-aware netlist. 

8.3 Annotated Scheduling 

The scheduling heuristic described in Section 5.2
schedules an arbitrary QASM instruction sequence on
an arbitrary layout. However, once we have assigned
instructions in a dataflow graph to node groups (as
described in Section 8.1), we wish those instructions
to be executed at their proper location on any layout
placed and routed from the group graph. To this
end, we annotate each instruction in the instruction
sequence with the name of the gate location where
it must be executed. Additionally, since we have the
gate locations in advance, we can incorporate
movement in the back-prioritization of the instruction
sequence. Thus, the priority assigned to each qubit
now incorporates both gate latencies and movement
through an uncongested layout, which gives us a
better approximation of each qubit's critical path. We
use this extended scheduler in our dataflow-based
experiments presented in Section 9.
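A minimal sketch of back-prioritization with movement is given below. It walks the instruction sequence backward and gives each instruction a priority equal to its gate latency plus the largest movement-plus-priority value among its successors. Attaching priorities to instructions rather than qubits, and all latency values, are assumptions made for this example.

```python
def back_prioritize(order, arcs, gate_latency, move_latency):
    """order: instruction labels in program order.
    arcs: (src, dst) dependencies between instructions.
    Returns label -> priority, the longest remaining gate-plus-movement
    time from that instruction to the end of the circuit."""
    succs = {n: [] for n in order}
    for src, dst in arcs:
        succs[src].append(dst)
    priority = {}
    for n in reversed(order):
        tail = max((move_latency[(n, s)] + priority[s] for s in succs[n]),
                   default=0)
        priority[n] = gate_latency[n] + tail
    return priority


# Hypothetical latencies in arbitrary time units.
order = ["A", "B", "C"]
arcs = [("A", "C"), ("B", "C")]
gate_latency = {"A": 1, "B": 1, "C": 10}
move_latency = {("A", "C"): 5, ("B", "C"): 2}

priority = back_prioritize(order, arcs, gate_latency, move_latency)
print(priority)
```

Without the movement term A and B would tie; with it, A is scheduled first because its qubit has farther to travel to reach the gate location of C.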

9 Results 

We now present our simulation results for the heuris- 
tics described in earlier sections. 

9.1 Benchmarks 

Relatively high error rates of operations in a quantum
computer necessitate heavy encodings of qubits. As
such, we focus on encoding circuits (useful for both
data and ancillae) and error correction circuits to
experiment with circuit layout techniques. We lay out
a number of error correction and encoding circuits to
evaluate the effectiveness of the heuristics used in our
CAD flow in terms of circuit area and latency, as
determined by our scheduler. Our circuit benchmarks
are shown in Table 1. We use two level 1 (L1)
encoding circuits, a level 2 (L2) recursive encoding
circuit and a fault-tolerant level 1 correction circuit.

Circuit name                    Qubit count   Gate count
[[7,1,3]] L1 encode [20]                  7           21
[[23,1,7]] L1 encode [21]                23          116
[[7,1,3]] L1 correction [1]              21          136
[[7,1,3]] L2 encode [20]                 49          245

Table 1: List of our QECC benchmarks, with quantum
gate count and number of qubits processed in
the circuit.

The idea of the encoding circuits is that they will 
provide a constant stream of encoded ancillae to in- 
teract with encoded data qubit blocks. Thus, for 
these circuits, throughput is a more important mea- 
sure than latency, implying that they would benefit 
greatly from pipelining. Nonetheless, a high latency 
circuit could introduce non-trivial error due to in- 
creased qubit idle time. On the other hand, correc- 
tion circuits are much more latency dependent, since 
they are on the critical path for the processing of data 
qubit blocks. 

9.2 Evaluation 

We have evaluated a variety of layout design
heuristics on the four benchmarks shown in Table 1. The
results are in Table 2. "QPOS Grid" refers to
the best scheduled layout from the literature [16]
(see Section 6). "Optimal Grid" refers to the best
grid, with an area matching that of the QPOS Grid,
that was found by the exhaustive search described
in Section 6. "Greedy" refers to the heuristic
described in Section 7. "DF" refers to the dataflow-
based approach from Section 8. "Non-folded" means
the dataflow graph is laid out with varying column
widths; "folded" means the layout has been made
more rectangular by stacking columns. The number
of global channels given is the number between each
pair of rows and columns of gate locations. "Critical
combining" refers to our dataflow group graph merging
heuristic.

The exhaustive search over grids yields the best
latency for all benchmarks, which is not surprising.
This kind of search becomes intractable quickly as
circuit size grows, and additionally, it is based on the
unproven assumption that a regular layout pattern
is the best approach. We include this data point as
something to keep in mind as a target latency.

Circuit / Heuristic                                       Latency (μs)    Area

[[7,1,3]] L1 encode
  QPOS Grid                                                      548.0      49
  Optimal Grid                                                   509.0      49
  Greedy channel and gate location placement                     648.0      36
  Folded DF, 2 global channels, critical combining               712.4     182

[[23,1,7]] Golay encode
  QPOS Grid                                                     2268.0     575
  Optimal Grid                                                  1801.0     575
  Greedy channel and gate location placement                    2457.0     168
  Non-folded DF, 2 global channels, critical combining          2169.2    3880
  Folded DF, 1 global channel, critical combining               2264.0     713
  Folded DF, 2 global channels, critical combining              2248.2    1394

[[7,1,3]] L1 correction
  QPOS Grid                                                     1300.0    1271
  Optimal Grid                                                   771.0    1271
  Greedy channel and gate location placement                    1932.0     756
  Non-folded DF, 2 global channels, critical combining           999.8    2378
  Folded DF, 1 global channel, critical combining               1501.2     690
  Folded DF, 2 global channels, critical combining              1121.2    1496

[[7,1,3]] L2 encode
  QPOS Grid                                                     2411.0    1365
  Optimal Grid                                                  1367.0    1365
  Greedy channel and gate location placement                    4791.0     936
  Non-folded DF, 2 global channels, critical combining          1582.4    4087
  Folded DF, 1 global channel, critical combining               1828.6    1617
  Folded DF, 2 global channels, critical combining              1944.8    3381

Table 2: Latency results for a variety of ECC circuits with different placement and routing heuristics.

Among the polynomial-time heuristics, we first 
note that no single heuristic is optimal for all four 
benchmarks and that, in fact, no single heuristic op- 
timizes both latency and area for any single circuit. 
Dataflow-based place and route techniques in general 
produce the lowest latency circuits. We find that the 
optimal global channel count per column (1 or 2) de- 
pends on the circuit being laid out. This is an artifact 
of the lack of maturity in our routing methodology. 
We intend to explore more adaptive routing optimiza- 
tion in our ongoing work. 

The dataflow approach and the QPOS Grid tend 
to trade off between latency and area. However, we 
expect that the dataflow approach will show greater 
potential for pipelining, thus allowing us to target cir- 
cuits such as an encoded ancilla generation factory, 
for which throughput is of greater importance than 
latency. We also observe that non-folded dataflow 
layouts are likely to have even greater pipelinability 
than folded ones, but at the likely cost of greater area. 
We should note, however, that the area estimates for
the non-folded DF-based layouts are in fact
overestimates due to our use of a liberal bounding box for
these calculations.
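To make the latency/area trade-off concrete, the short script below expresses a few dataflow results relative to the QPOS Grid. The numbers are copied from Table 2; the helper name and the choice of rows are ours.

```python
# circuit -> heuristic -> (latency in microseconds, area), from Table 2
TABLE2 = {
    "[[7,1,3]] L1 correction": {
        "QPOS Grid": (1300.0, 1271),
        "Non-folded DF, 2 global channels": (999.8, 2378),
        "Folded DF, 2 global channels": (1121.2, 1496),
    },
    "[[7,1,3]] L2 encode": {
        "QPOS Grid": (2411.0, 1365),
        "Non-folded DF, 2 global channels": (1582.4, 4087),
        "Folded DF, 2 global channels": (1944.8, 3381),
    },
}


def relative_to_baseline(rows, baseline="QPOS Grid"):
    """Return heuristic -> (latency ratio, area ratio) against the baseline."""
    base_latency, base_area = rows[baseline]
    return {
        name: (round(latency / base_latency, 2), round(area / base_area, 2))
        for name, (latency, area) in rows.items()
        if name != baseline
    }


ratios = {circuit: relative_to_baseline(rows) for circuit, rows in TABLE2.items()}
for circuit, result in ratios.items():
    print(circuit, result)
```

For the L1 correction circuit, for example, the non-folded dataflow layout runs in about 0.77 of the QPOS Grid latency while its bounding box is about 1.87 times as large.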

We find that the greedy heuristic tends to find 
the best design area-wise, but the latency penalty 
increases with circuit complexity. This is expected, 
as greedy is unable to handle congestion problems, 
so it works best for small circuits where congestion 
is not an issue. It is for the opposite reason that the 
DF heuristics fail on the [[7,1,3]] L1 encode. They
insert too much complexity into an otherwise simple 
problem. 

10 Conclusion 

We presented a computer-aided design flow for the 
layout, scheduling and control of ion trap-based 
quantum circuits. We focused on physical quantum 
circuits, that is, ones for which all ancillae, encod- 
ings and interconnect are explicitly specified. We 
explored several mechanisms for generating optimal 
layouts and schedules for our benchmark circuits. 
Prior work has tended to assume a specific regular 



grid structure and to schedule operations within this 
structure. We investigated a variety of grid structures 
and showed a performance variance of a factor of four 
as we varied grid structure and initial qubit place- 
ment. Since exhaustive search is clearly impractical 
for large circuits, we also explored two polynomial- 
time heuristics for automated layout design. Our 
greedy algorithm produces good results for very sim- 
ple circuits, but quickly begins to be suboptimal as 
circuit size grows. For larger circuits, we investigated
a dataflow-based analysis of the quantum circuit to
assist a place and route mechanism which borrows
from classical algorithms. We found that our
dataflow approach generally offers the best latency,
often at the cost of area. However, we expect that a
layout based on the dataflow graph analysis also of- 
fers better potential for pipelining than a grid-based 
approach, and we intend to investigate this further in 
the future. 